On Class-probability Estimates and Cost-sensitive Evaluation of Classifiers
Abstract
This paper addresses two methodological issues in cost-sensitive learning. First, we ask whether Bagging is always an appropriate procedure for computing accurate class-probability estimates for cost-sensitive classification. Second, we point the reader to a potential source of erroneous results in the most common procedure for evaluating cost-sensitive classifiers when the real misclassification costs are unknown. The paper concludes with an experimental comparison of MetaCost and BagCost, a procedure that labels unseen examples based on their class-probability estimates.

1. Class-probability Estimates

Let ⟨x_i, y_i⟩ be a set of training instances, and let Y = {y_1, y_2, …, y_k} be the set of possible labels. Assume that the misclassification costs are static and are described by a k × k cost matrix C, with C(i, j) specifying the cost incurred when an example is predicted to be in class i when in fact it belongs to class j. If we denote by P(y | x) the class probability of an arbitrary example x, then the optimal decision for x is given by (Duda & Hart, 1973):

h(x) = \arg\min_{y \in Y} \sum_{j=1}^{k} P(y_j \mid x) \, C(y, y_j)    (1)

This gives us a cost-sensitive classification procedure whose performance relies on the accuracy of the computed class-probability estimates. MetaCost (Domingos, 1999) is an algorithm that relabels each training example with the cost-optimal label (computed according to (1)) and outputs the decision of a 0/1-loss classifier trained on the relabeled data. MetaCost uses Bagging (Breiman, 1996) to compute the class-probability estimates. Bagging builds an ensemble of classifiers by training the same learning algorithm on a series of bootstrap samples of the training data. The class probability P(y_i | x) is estimated by the fraction of votes (each classifier in the ensemble itself outputs a vote between 0 and 1). However, a careful analysis of Bagging shows that these votes represent a measure of the variance of the base classifier, which is different from the probability we want to estimate (a parameter of the data). In the extreme case, if the concept can be learned perfectly under 0/1 loss by an algorithm trained on an arbitrary bootstrap sample, then the cost-sensitive decision boundaries computed by MetaCost will coincide with the 0/1-loss decision boundaries, which can be far from optimal. Although …
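To make the decision rule concrete, here is a minimal Python sketch (not the paper's code) of rule (1) applied to bagged vote fractions; the ensemble interface, the function names, and the example cost matrix are illustrative assumptions.

```python
import numpy as np

def bagged_class_probabilities(models, x):
    # Estimate P(y_j | x) as the mean vote over a bagged ensemble,
    # as MetaCost does; each model is assumed (hypothetically) to
    # return a vector of k votes in [0, 1] for the example x.
    votes = np.array([m.predict(x) for m in models])  # (n_models, k)
    return votes.mean(axis=0)

def cost_optimal_label(probs, C):
    # Decision rule (1): argmin_y sum_j P(y_j | x) * C(y, y_j),
    # where C[i, j] is the cost of predicting i when the truth is j.
    expected_costs = C @ probs
    return int(np.argmin(expected_costs))

# Example: a false negative (predicting 0 when the truth is 1) is
# five times as costly as a false positive.
C = np.array([[0.0, 5.0],
              [1.0, 0.0]])
probs = np.array([0.7, 0.3])          # estimated P(y=0|x), P(y=1|x)
print(cost_optimal_label(probs, C))   # -> 1, despite class 0 being
                                      #    more probable (0.7*1 < 0.3*5)
```

MetaCost would apply `cost_optimal_label` to every training example and retrain a plain 0/1-loss learner on the relabeled data; the paper's caveat is that the vote fractions fed into the rule may track the base learner's variance rather than the true class probabilities.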
Similar Papers
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers
Accurate, well-calibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a cost-sensitive decision must be made about examples with example-dependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naive Bayesian classifiers. Using the large and cha...
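The simplest correction in this spirit for decision trees is to smooth the raw leaf frequencies. The sketch below shows m-estimate smoothing of a leaf's class frequency toward the training base rate; the function name and the choice m = 10 are assumptions for illustration, not the paper's exact recipe.

```python
def smoothed_leaf_probability(n_pos, n_total, base_rate, m=10.0):
    # m-estimate smoothing: shrink the raw leaf frequency
    # n_pos / n_total toward the training-set base rate, so that
    # small, pure leaves no longer claim probability exactly 0 or 1.
    return (n_pos + m * base_rate) / (n_total + m)

# A leaf holding 3 positives out of 3 examples would naively report
# P = 1.0; with base_rate = 0.2 it reports a more cautious estimate.
print(smoothed_leaf_probability(3, 3, 0.2))  # -> 0.3846...
```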
Improved Class Probability Estimates from Decision Tree Models
Decision tree models typically give good classification decisions but poor probability estimates. In many applications, it is important to have good probability estimates as well. This paper introduces a new algorithm, Bagged Lazy Option Trees (B-LOTs), for constructing decision trees and compares it to an alternative, Bagged Probability Estimation Trees (B-PETs). The quality of the class proba...
An application of Measurement error evaluation using latent class analysis
Latent class analysis (LCA) is a method of evaluating non-sampling errors, especially measurement error in categorical data. Biemer (2011) introduced four latent class modeling approaches: probability model parameterization, log-linear model, modified path model, and graphical model using path diagrams. These models are interchangeable. Latent class probability models express l...
Reducing multiclass to binary by coupling probability estimates
This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method p...
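For the pairwise special case that this paper generalizes, the Hastie-Tibshirani coupling can be sketched as the following fixed-point iteration; equal weights for all class pairs and the function name are simplifying assumptions.

```python
import numpy as np

def pairwise_coupling(R, n_iter=200, tol=1e-10):
    # R[i, j] is the binary classifier's estimate of
    # P(class i | x, class in {i, j}); R[j, i] = 1 - R[i, j].
    # Iteratively rescale p so that the implied pairwise ratios
    # mu_ij = p_i / (p_i + p_j) match the observed R.
    k = R.shape[0]
    off = ~np.eye(k, dtype=bool)              # skip i == j terms
    p = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        mu = p[:, None] / (p[:, None] + p[None, :])
        p_new = p * np.where(off, R, 0.0).sum(axis=1) \
                  / np.where(off, mu, 0.0).sum(axis=1)
        p_new /= p_new.sum()
        if np.abs(p_new - p).max() < tol:
            break
        p = p_new
    return p_new

# Three classes; each R[i, j] comes from the (i vs j) binary model.
R = np.array([[0.5, 0.6, 0.8],
              [0.4, 0.5, 0.7],
              [0.2, 0.3, 0.5]])
print(pairwise_coupling(R))  # coupled estimates, summing to 1
```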
Certain Inequalities for a General Class of Analytic and Bi-univalent Functions
In this work, a subclass of the function class S of analytic and bi-univalent functions is defined and studied in the open unit disc. Estimates for the initial coefficients of the Taylor-Maclaurin series of bi-univalent functions belonging to this class are obtained. By choosing special values for the parameters and functions, it is shown that the class reduces to several earlier known classes of analy...